CISA Data Importer via GitHub Repo- Added #1614

Draft
wants to merge 9 commits into base: main


Rishi-source

Add CISA GOV Vulnrichment Importer

This pull request adds a new importer for the CISA GOV Vulnrichment dataset. The importer fetches vulnerability data from the CISAGOV/vulnrichment GitHub repository and imports it into our database.

Related Issue

Closes #1475

Changes

  • Added a new VulnrichImporter class in vulnerabilities/importers/cisagov.py
  • Implemented methods to fetch and parse advisory data from the CISAGOV GitHub repository
  • Added deduplication logic to prevent unnecessary updates to existing records
  • Integrated the new importer with the existing import system
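
For reviewers, here is a minimal sketch of what content-based deduplication could look like. It assumes each advisory is available as a plain JSON-serializable dict; `content_fingerprint` and `should_update` are hypothetical names for illustration and are not taken from this PR's code.

```python
import hashlib
import json


def content_fingerprint(advisory: dict) -> str:
    """Return a stable SHA-256 hex digest of an advisory's content."""
    # sort_keys makes the serialization independent of dict key order,
    # so two dicts with equal content always hash to the same value.
    canonical = json.dumps(advisory, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_update(stored_fingerprints: dict, advisory_id: str, advisory: dict) -> bool:
    """Report whether the advisory is new or changed, recording its fingerprint.

    `stored_fingerprints` stands in for whatever the database keeps per
    advisory (an assumption; a real importer would persist this value).
    """
    new_fingerprint = content_fingerprint(advisory)
    if stored_fingerprints.get(advisory_id) == new_fingerprint:
        return False  # identical content: skip the write
    stored_fingerprints[advisory_id] = new_fingerprint
    return True
```

Hashing a canonical serialization rather than comparing field by field keeps the check cheap and means re-running the importer on unchanged data performs no writes.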

How to Use

To use the new importer, run the following management command:

python manage.py import vulnerabilities.importers.cisagov.VulnrichImporter

This command will fetch the latest data from the CISAGOV/vulnrichment repository and import it into the database.

Features

  • Fetches vulnerability data from the CISAGOV/vulnrichment GitHub repository
  • Parses CVE data, including CVE ID, summary, references, and weaknesses
  • Extracts severity scores
  • Implements content-based deduplication to avoid unnecessary updates
  • Logs skipped advisories for transparency
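
To make the parsing features concrete, here is a rough sketch of extracting those fields from a single record. It assumes the repository's files follow the CVE JSON 5.x record layout (`cveMetadata`, `containers.cna`, `containers.adp`); `parse_cve_record` and the shape of its return value are hypothetical and may differ from what the importer in this PR actually does.

```python
def parse_cve_record(record: dict) -> dict:
    """Extract ID, summary, references, weaknesses and CVSS scores from a record."""
    cve_id = record.get("cveMetadata", {}).get("cveId")
    containers = record.get("containers", {})
    cna = containers.get("cna", {})
    adp_list = containers.get("adp", [])

    # Use the first English description as the summary, if there is one.
    summary = next(
        (d.get("value", "") for d in cna.get("descriptions", [])
         if d.get("lang", "").startswith("en")),
        "",
    )

    references, weaknesses, severities = [], [], []
    # Enrichment data lives in ADP containers, so scan the CNA and every ADP.
    for container in [cna, *adp_list]:
        for ref in container.get("references", []):
            url = ref.get("url")
            if url and url not in references:
                references.append(url)
        for problem_type in container.get("problemTypes", []):
            for desc in problem_type.get("descriptions", []):
                cwe = desc.get("cweId")
                if cwe and cwe not in weaknesses:
                    weaknesses.append(cwe)
        for metric in container.get("metrics", []):
            for key, value in metric.items():
                # Keep only CVSS entries (e.g. "cvssV3_1"); skip other metric kinds.
                if key.startswith("cvssV") and isinstance(value, dict):
                    severities.append({
                        "system": key,
                        "score": value.get("baseScore"),
                        "vector": value.get("vectorString"),
                    })

    return {
        "cve_id": cve_id,
        "summary": summary,
        "references": references,
        "weaknesses": weaknesses,
        "severities": severities,
    }
```

Every lookup uses `.get()` with a default, so a record that lacks a section yields empty values instead of raising.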

Testing

  • Tested the importer with a sample dataset from the CISAGOV repository
  • Verified that duplicate entries are not created when running the importer multiple times
  • Checked that updates are only applied when the advisory content has changed

Additional Notes

  • The importer uses the GitHub API to fetch repository content, so production deployments need to account for GitHub API rate limits (for example, by authenticating requests, which raises the limit).
  • The GITHUB_API_BASE, REPO_OWNER, REPO_NAME, and BRANCH constants in the importer can be adjusted if the source repository changes in the future.
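
As an illustration of the two notes above, here is a small sketch of how the constants might combine into a contents URL and how a caller could react to GitHub's rate-limit headers. The constant values are assumptions (in particular `BRANCH`; check the repository's default branch), and `contents_url` and `seconds_until_reset` are hypothetical helpers, not code from this PR.

```python
import time
from typing import Optional
from urllib.parse import quote

GITHUB_API_BASE = "https://api.github.com"
REPO_OWNER = "cisagov"
REPO_NAME = "vulnrichment"
BRANCH = "develop"  # assumed value, not confirmed by this PR


def contents_url(path: str = "") -> str:
    """Build the REST "repository contents" URL for a path on BRANCH."""
    url = f"{GITHUB_API_BASE}/repos/{REPO_OWNER}/{REPO_NAME}/contents"
    if path:
        # quote() keeps "/" unescaped by default, so nested paths stay intact.
        url = f"{url}/{quote(path.strip('/'))}"
    return f"{url}?ref={quote(BRANCH, safe='')}"


def seconds_until_reset(headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0 if quota remains).

    Reads the X-RateLimit-Remaining and X-RateLimit-Reset response headers.
    A plain dict is case-sensitive, so the header names must match exactly.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", "0"))  # epoch seconds
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Keeping the URL construction in one function means a change of source repository or branch only touches the constants.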

Please review and let me know if any changes or additional information is needed.

@pombredanne
Member

@keshav-space should we now instead use the new importer pipeline approach?

@keshav-space
Member

> @keshav-space should we now instead use the new importer pipeline approach?

@pombredanne yes, we have a decent number of pipelines at https://github.com/aboutcode-org/vulnerablecode/tree/main/vulnerabilities/pipelines, and there are brief instructions on how to write a pipeline in #1589 (comment). I still need to add this to our tutorials on Read the Docs.

@Rishi-source
Author

@keshav-space Can you please tell me what the difference is between the importer pipeline approach and normal importing?

@Rishi-source marked this pull request as draft on October 15, 2024, 16:15
Successfully merging this pull request may close these issues.

Collect https://github.com/cisagov/vulnrichment
4 participants